title: SeqPlots author: Przemyslaw Stempor framework: impressjs #highlighter: highlight.js # {highlight.js, prettify, highlight} #hitheme: tomorrow
#mode: draft mode: selfcontained — #s .slide x:0 y:-2000 z:0
Przemyslaw Stempor @ Graduate Seminar Series, 17 November 2015
— #big x:0 y:-1250 rot:0 z:100 scale:1 .hide The BIG question!
— #q x:0 y:-700 z:0 rot:0 scale:2 .cent What are the relationships between chromatin features, underlying DNA sequence and gene regulation?
— #qf1v2 x:-300 y:50 z:0 rot:0 scale:1.2Source: http://www.cliffsnotes.com/assets/24452.jpg
— #qf2 x:700 y:50 z:0 rot:0 scale:1.2
— #tools x:0 y:900 z:0 rot:0 scale:2
— #t1 x:-350 y:1200 z:0 rot:0 scale:1.2 - ChIP-seq - RNA-seq - DNase-seq - MNase-seq - ATAC-seq
— #rad x:-350 y:1650 z:0 rot:0 scale:1 .red
— #problem x:300 y:1100 z:0 rot:0 roty:90 scale:1.2 .sm .hide Technical problem – typical experiment produce tens of millions of such positions over hundreds of millions to billions possible locations (base pairs) in the genome!
— #solution1 x:300 y:1450 z:0 rot:0 scale:1 Solutions: - Shrink/simplify the data so they are small enough for us to understand (e.g. peak calls, unsupervised machine learning)
— #solution2 x:300 y:1750 z:0 rot:0 scale:1 - Use data visualization to make an original comprehensible to us
— #vis x:0 y:2400 z:0 rot:0 scale:2 ## Scientific data visualization – is it important?
— #v1 x:0 y:2650 z:0 rot:0 scale:2 .sm - Data visualization is prevalent approach in science
— #v1a x:-1000 y:2650 z:300 roty:90 scale:1 .md Shaded matrix display from Loua (1873).
— #v2 x:0 y:3000 z:0 rot:0 scale:2.0 .sm - Since the advent of sequencing techniques there is great advance in methods specific to this field - Helps us to better understand the data and find the patterns that might be lost due to shrinkage/simplification - Great for exploratory data analyses - Very useful for results presentation
— #stat x:0 y:3900 z:0 rot:0 scale:3 .sm .hide ## We can visualize reads directly, but usually more useful is converting them to a read coverage
— #statp1 x:-300 y:4600 z:200 rot:0 scale:1.5Source: http://bedtools.readthedocs.org/en/latest/content/tools/genomecov.html
— #statp2 x:100 y:5200 z:0 rot:0 scale:1.5
— #o3 x:0 y:4600 z:1200 rot:0 scale:1
— #global x:3000 y:-2300 z:0 scale:2 ## *-seq data visualization:
global approaches.
— #circos x:3000 y:-1700 z:0 scale:1.2
— #gb x:3000 y:-900 z:0 scale:2 ## *-seq data visualization:
genome browsers.
— #ucsc x:2670 y:-500 z:0 scale:1 UCSC Genome Browser
— #igv x:3350 y:-500 z:0 scale:1 IGV (Broad Institute)
— #gbo x:3000 y:-400 z:600 scale:1
— #trf x:3000 y:2400 z:0 scale:2 ## *-seq data visualization:— #av x:3000 y:3100 z:0 scale:2 .sm .cent
— #hm x:3000 y:3920 z:0 scale:2 .sm .cent
— #osp x:3000 y:3250 z:1500 scale:1
— #ngsp1 x:2550 y:5000 z:0 scale:1.2 .sm .cent Command line tools, e.g. ngsplot
— #logo x:3000 y:1300 scale:4
— #why x:6000 y:-2200 scale:3 .cent
— #why1 x:6000 y:-1600 scale:2 .md Existing solutions did not meet our requirements: - Custom scripts and pargramic languages labraries allows to run things in batch, but are too complicated to run for users without IT expertise - Even with good training these tools requires a lot of time to code - Galaxy/Cistrome was too slow and not configurable enough (plus data privacy problem!)
— #why2 x:6000 y:-1000 scale:2 .cent .md I want take the best from two worlds - connect the intuitiveness and interactiveness of genome browsers with visualization power of plotting 1000s of genomic features at once.
— #why3 x:6000 y:-500 scale:2 .cent .lg— #seqplots x:6000 y:-50 scale:3 .cent ## SeqPlots is this software!
— #spim x:6000 y:900 scale:2 .cent .mdFiles - signal profiles from ChIP-seq experiments: - H3K4me3 (mark active promoters) - H3K36me3 (mark transcribed regions of active genes)
Files - genomic features: - C. elegans transcription start sites (TSS), divided into 5 expression bins based on RNA-seq data
Tasks: - Compare histone marks between highly and lowly expressed genes. - Check if CpG (CG-dinucleotide) occupancy is higher on transcription start sites (TSS) relative to local neighborhood
— #vid x:6000 y:2800 scale:1.8 .lg .cent
— #spaccs x:6000 y:3900 scale:2 .lgThe app is available as:
— #clust x:5900 y:5000 scale:1.5